SYSTEM AND METHOD FOR RATE-DISTORTION OPTIMIZED DATA
PARTITIONING FOR VIDEO CODING USING PARAMETRIC RATE-DISTORTION MODEL

BACKGROUND OF THE INVENTION 

1. Field Of The Invention

The present invention relates to scalable video coding systems and, in particular, to a
general rate-distortion optimized data partitioning (gRDDP) of discrete cosine transform
(DCT) coefficients for video transmission over a lossy packet network using a parametric
rate-distortion (RD) model.

2. Description Of The Related Art

Video is a sequence of pictures; each picture is formed by an array of pixels. Uncompressed
video is very large, so video compression is used to reduce its size and the data rate
required for transmission. Various video coding methods (e.g., MPEG 1, MPEG 2, and MPEG 4)
have been established to provide an international standard for the coded representation of
moving pictures and associated audio on digital storage media.

Such video coding methods format and compress the raw video data for reduced
rate transmission. For example, the format of the MPEG 2 standard consists of four layers:
Group of Pictures, Picture, Slice, and Macroblock. A video sequence begins with a sequence
header, includes one or more groups of pictures (GOP), and ends with an end-of-sequence
code. The Group of Pictures (GOP) includes a header and a series of one or more pictures
intended to allow random access into the video sequence.

The pictures are the primary coding unit of a video sequence. A picture consists of
three rectangular matrices representing luminance (Y) and two chrominance (Cb and Cr)
values. The Y matrix has an even number of rows and columns. The Cb and Cr matrices
are one-half the size of the Y matrix in each direction (horizontal and vertical). The slices
are one or more "contiguous" macroblocks. The order of the macroblocks within a slice is
from left-to-right and top-to-bottom.

The macroblocks are the basic coding unit in the MPEG algorithm. The
macroblock is a 16x16 pixel segment in a frame. Since each chrominance component has
one-half the vertical and horizontal resolution of the luminance component, a macroblock
consists of four Y, one Cr, and one Cb block. The block is the smallest coding unit in the
MPEG algorithm. It consists of 8x8 pixels and can be one of three types: luminance (Y),
red chrominance (Cr), or blue chrominance (Cb). The block is the basic unit in intra frame
coding.

The MPEG 2 standard defines three types of pictures: Intra Pictures (I-Pictures);
Predicted Pictures (P-Pictures); and Bidirectional Pictures (B-Pictures). Intra pictures, or
I-pictures, are coded using only information present in the picture itself, and provide
potential random access points into the compressed video data. Predicted pictures, or
P-pictures, are coded with respect to the nearest previous I- or P-picture. Like I-pictures,
P-pictures can also serve as a prediction reference for B-pictures and future P-pictures.
Moreover, P-pictures use motion compensation to provide more compression than is
possible with I-pictures. Bidirectional pictures, or B-pictures, are pictures that use both a
past and a future picture as a reference. B-pictures provide the most compression since they
use both the past and the future picture as a reference. These three types of pictures are
combined to form a group of pictures.

The MPEG transform coding algorithm includes the following coding steps:
discrete cosine transform (DCT), quantization, and run-length encoding.

An important technique in video coding is scalability. In this regard, a scalable
video codec is defined as a codec that is capable of producing a bitstream that can be
divided into embedded subsets. These subsets can be independently decoded to provide
video sequences of increasing quality. Thus, a single compression operation can produce
bitstreams with different rates and reconstructed quality. A small subset of the original
bitstream can be initially transmitted to provide a base layer quality, with extra layers
subsequently transmitted as enhancement layers. Scalability is supported by most of the
video compression standards, such as MPEG-2, MPEG-4 and H.263.

An important application of scalability is in error resilient video transmission.
Scalability can be used to apply stronger error protection to the base layer than to the
enhancement layers (i.e., unequal error protection). Thus, the base layer will be
successfully decoded with high probability even during adverse transmission channel
conditions.

Data Partitioning (DP) is used to facilitate scalability. For example, in MPEG 2,
the slice layer indicates the maximum number of block transform coefficients contained in
the particular bitstream (known as the priority break point). Data partitioning is a
frequency domain method that breaks the block of 64 quantized transform coefficients
into two bitstreams. The first, higher priority bitstream (e.g., base layer) contains the more
critical lower frequency coefficients and side information (such as DC values, motion
vectors). The second, lower priority bitstream (e.g., enhancement layers) carries higher
frequency AC data.
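
The fixed break point scheme described above can be pictured with a short sketch. The Python below is only an illustration under stated assumptions: the break point is modeled as the number of zig-zag positions kept in the higher priority partition, each pair is a (run, level) tuple with run counting the zeros before the level, and the name `split_at_break_point` is hypothetical. The actual MPEG 2 bitstream syntax is more involved.

```python
def split_at_break_point(pairs, pbp):
    """Split the (run, level) pairs of one block at a priority break point.

    pairs: list of (run, level) tuples in zig-zag order, where run is the
           number of zeros preceding the non-zero level.
    pbp:   assumed here to be the number of zig-zag positions (0..64)
           whose coefficients go to the higher priority partition.
    """
    base, enhancement = [], []
    position = 0  # zig-zag index of the next coefficient
    for run, level in pairs:
        position += run              # skip the zeros preceding this level
        if position < pbp:
            base.append((run, level))
        else:
            enhancement.append((run, level))
        position += 1                # account for the non-zero coefficient
    return base, enhancement


# Hypothetical block: non-zero levels at zig-zag positions 0, 1, 4 and 10.
pairs = [(0, 12), (0, -5), (2, 3), (5, 1)]
base, enh = split_at_break_point(pairs, pbp=4)
```

Because the zig-zag position only grows, every pair after the first one sent to the enhancement partition also lands there, so concatenating the two lists restores the original order.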

Figure 1 shows a block diagram illustrating data partitioning that may be
implemented outside the encoder. At the transmitter, the demultiplexer receives from the
variable length decoder (VLD) the number of bits used for each variable length code and
separates the bitstream based on the priority break point (PBP) value. Note that the PBP
can be changed at each slice based on the rate partitioning logic used. In particular, in
conventional DP video coders (e.g., MPEG), a single layer bit stream is partitioned into two
or more bit streams in the DCT domain. During transmission, one or more bit streams are
sent to achieve bit rate scalability. Unequal error protection can be applied to base and
enhancement layer data to improve robustness to channel degradation.

Figure 2 shows a block diagram illustrating merging that may be implemented
outside the decoder. As shown, two VLDs are used to process the base layer and
enhancement layer streams and then output a non-layered bitstream. The PBP defines how
an encoded bitstream is partitioned. Before decoding, depending on resource allocation
and/or receiver capacity, the received bitstreams or a subset of them are merged into one
single bitstream and decoded.

The conventional DP structure has advantages in a home network environment.
More specifically, at its full quality, the rate-distortion performance of the DP is as good
as its single layer counterpart while rate scalability is also allowed. The rate-distortion
(R-D) performance is concerned with finding an optimal combination of rate and distortion.
This optimal combination, which could also be seen as the optimal combination of cost
and quality, is not unique. R-D schemes attempt to represent a piece of information with
the fewest bits possible and at the same time in a way that will lead to the best
reproduction quality.

It is also noted that in the conventional DP structure, the additional decoding
complexity overhead is minimal at its full quality while the DP provides a wider range
of decoder complexity scalability. This is because variable length decoding (VLD) of
DCT run-length pairs, which is the most computationally intensive part, now becomes
scalable.

In the conventional DP structure, the DCT priority break point (PBP) value needs 
to be transmitted explicitly as side information. To minimize the overhead, the PBP value 
is usually fixed for all the DCT blocks within each slice or video packet. 

While the conventional DP method is simple and has some advantages, it is not
capable of adapting base layer optimization because only one PBP value is used for all
blocks within each slice or video packet. In addition, a prediction drift occurs at low bit
rates as a result of the single-loop prediction structure used for data partitioning. Thus, it
is difficult during data partitioning to choose the DCT break point for each block
such that the base layer quality at a given base partition rate is optimal. In order to
achieve a minimum distortion at the base layer, the partitioning point must be allowed to
vary at the DCT block level. However, such a fine control of the breakpoint introduces
significant rate overhead due to the explicit transmission of breakpoint values.

Accordingly, there exists a need for video coding techniques that overcome the 
limitations of the conventional data partitioning scheme and provide improved base layer 
optimization. 

SUMMARY OF THE INVENTION 

The present invention addresses the foregoing need and provides additional 
advantages, by providing an improved data partitioning technique by employing a 
parametric RD model. In one embodiment of the present invention, this can be achieved 
with minimal overhead (≈ 20 bits for each slice or video packet or even for each frame)
by employing context-based backward adaptation. 
One aspect of the present invention is directed to a system and method that
provide a general rate-distortion optimized data partitioning (gRDDP) of DCT coefficients
for video transmission.

In another aspect of the present invention, the RDDP adapts the partition point
block-by-block, and hence greatly improves the coding efficiency of the base layer bit stream.
This also allows a decoder to find the partition location in a backward fashion from the
decoded data without explicit transmission, hence saving bandwidth significantly.

In yet another aspect of the present invention, a Lagrangian parameter $\lambda$ is
calculated. The value of $\lambda$ is determined to meet the rate budget $R_b$ (for the base
layer transmission channel) using a standard one-dimensional bisection algorithm.

One embodiment of the present invention is directed to a data partitioning method
for a scalable video encoder. The method includes the steps of receiving video data;
determining DCT coefficients for a plurality of macroblocks of a video frame; quantizing
the DCT coefficients and converting the quantized DCT coefficients into (run, length)
pairs; and determining the slope of the parametric rate-distortion curve for each of the
plurality of macroblocks in the video frame, wherein if the slope is less than $\lambda$ or if
the k-th slope is a first slope that is not less than $\lambda$, the k-th (run, length) pair is
written into the base layer, otherwise if the k-th slope is greater than $\lambda$, the k-th
(run, length) pair is written into the at least one enhancement layer, where $\lambda$ is
determined in accordance with a Lagrangian calculation.

Another embodiment of the present invention is directed to a method for
determining a boundary between a base layer and at least one enhancement layer in a
scalable video decoder. The method includes the steps of receiving the base layer and the
at least one enhancement layer, the base layer and enhancement layer including data
representing (run, length) pairs for a plurality of macroblocks in a video frame, and, for each
of the plurality of macroblocks in the video frame, determining the slope of the parametric
rate-distortion curve. If the slope is less than $\lambda$ or if the k-th slope is a first slope
that is not less than $\lambda$, the k-th (run, length) pair is read from the base layer,
otherwise if the k-th slope is greater than $\lambda$, the k-th (run, length) pair is read from
the at least one enhancement layer, where $\lambda$ is determined in accordance with a
Lagrangian calculation.

Yet another embodiment of the present invention is directed to a scalable decoder
capable of merging data from a base layer and at least one enhancement layer. The
decoder includes a memory which stores computer-executable process steps, and a
processor which executes the process steps stored in the memory so as to (1) receive the
base layer and the at least one enhancement layer, the base layer and enhancement layer
including data representing (run, length) pairs for a plurality of macroblocks in a video
frame, (2) for each of the plurality of macroblocks in the video frame, determine a
parametric rate-distortion model, (3) compute the slope (tangent) of the parametric
rate-distortion model using k (run, length) pairs for an i-th block, and (4) if the slope of
the parametric model updated using k (run, length) pairs is less than $\lambda$ or if it is a
first slope that is not less than $\lambda$, read the k-th (run, length) pair from the base layer,
otherwise if the slope is greater than $\lambda$, read the k-th (run, length) pair from the at
least one enhancement layer, where $\lambda$ is determined in accordance with a Lagrangian
calculation.

Yet another embodiment of the present invention is directed to a scalable
transcoder. A single layer coded video bitstream (MPEG-1, MPEG-2, MPEG-4, H.264,
etc.) is partially decoded and the bitstream splitting point is determined for each DCT
block based on the aforementioned boundary determining method embodiment. Afterwards,
the VLC codes are split into two or more partitions based on the splitting points. The
partial decoding involves variable length decoding, inverse scanning and inverse
quantization only. No inverse DCT or motion compensation is needed.

The invention has particular utility in connection with variable-bandwidth 
networks and computer systems that are able to accommodate different bit rates, and 
hence different quality images. 



BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1 and 2 are general block diagrams of a system for data partitioning and 
merging. 

Figure 3 depicts a video coding system in accordance with one aspect of the 
present invention. 

Figure 4 depicts a typical convex Rate-Distortion curve.

Figure 5 depicts a non-convex Rate-Distortion curve.

Figure 6 depicts a computer system on which the present invention may be 
implemented. 

Figure 7 depicts the architecture of a personal computer in the computer system 
shown in Figure 6. 

Figure 8 depicts a block diagram of a transcoder in accordance with one 
embodiment of the present invention. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Figure 3 illustrates a scalable video system 100 with layered coding and transport
prioritization. A layered source encoder 110 encodes input video data. The output of the
layered source encoder 110 includes a base layer 121 and one or more enhancement layers
122-124. A plurality of channels 120 carry the output encoded data. A layered source
decoder 130 decodes the encoded data.

There are different ways of implementing layered coding. For example, in 
temporal domain layered coding, the base layer contains a bit stream with a lower frame 
rate and the enhancement layers contain incremental information to obtain an output with 
higher frame rates. In spatial domain layered coding, the base layer codes the sub- 
sampled version of the original video sequence and the enhancement layers contain 
additional information for obtaining higher spatial resolution at the decoder. 

Generally, a different layer uses a different data stream and has distinctly different 
tolerances to channel errors. To combat channel errors, layered coding is usually 
combined with transport prioritization so that the base layer is delivered with a higher 
degree of error protection. If the base layer 121 is lost, the data contained in the 
enhancement layers 122-124 may be useless. 

In one embodiment of the present invention, the video quality of the base layer 121
is flexibly controlled at the DCT block level. The desired base layer can be controlled by
adapting the break points at the DCT block level by employing a parametric RD model to
approximate the convex hull of the RD planes for each DCT block, thereby finding the
optimal partitioning points synchronously at the encoder and decoder (explained later with
reference to Figures 5 and 6).

It is noted that the purpose of the DCT is to reduce the spatial correlation between
adjacent error pixels, and to compact the energy of the error pixels into a few coefficients.
Because many high frequency coefficients are zero after quantization, variable length
coding (VLC) is accomplished by a run-length coding method, which orders the
coefficients into a one-dimensional array using a so-called zig-zag scan so that the
low-frequency coefficients are put in front of the high-frequency coefficients. This way, the
quantized coefficients are specified in terms of the non-zero values and the number of
preceding zeros. Different symbols, each corresponding to a pair of a zero run-length and a
non-zero value, are coded using variable length codewords.

The scalable video system 100 preferably uses entropy coding. In entropy coding, 
quantized DCT coefficients are rearranged into a one-dimensional array by scanning them 
in a zig-zag order. This rearrangement puts the DC coefficient at the first location of the 
array and the remaining AC coefficients are arranged from the low to high frequency, in 
both the horizontal and vertical directions. The assumption is that the quantized DCT 
coefficients at higher frequencies would likely be zero, thereby separating the non-zero 
and zero parts. The rearranged array is coded into a sequence of run-level pairs. The
run is defined as the distance between two non-zero coefficients in the array. The level is
the non-zero value immediately following a sequence of zeros. This coding method
produces a compact representation of the 8x8 DCT coefficients, since a large number of
the coefficients have already been quantized to zero.
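
The zig-zag scan and run-level conversion described above can be sketched as follows. This is a minimal Python illustration, not the coder of any particular standard: the function names are hypothetical, the run is counted as the number of zeros preceding each non-zero level, and the DC coefficient is treated like any other coefficient for simplicity.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    # Positions on the same anti-diagonal share row + col; the traversal
    # direction alternates from one anti-diagonal to the next.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_level_pairs(block):
    """Convert a quantized square block (list of rows) into (run, level) pairs."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for value in scan:
        if value == 0:
            run += 1                 # extend the current run of zeros
        else:
            pairs.append((run, value))
            run = 0
    return pairs                     # trailing zeros are left implicit


# A small 4x4 block keeps the example short; real blocks are 8x8.
block = [
    [9, 3, 0, 0],
    [-2, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
pairs = run_level_pairs(block)
```

In a real bitstream the implicit trailing zeros would be signalled by an end-of-block code, which this sketch omits.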

The run-level pairs and the information about the macroblock, such as the motion 
vectors, and prediction types, are further compressed using entropy coding. Both variable- 
length and fixed-length codes are used for this purpose. 

The design of the video system 100 is motivated by the operational rate-distortion 
(RD) theory. RD theory is useful in coding and compression scenarios, where the 
available bandwidth is known a priori and where the purpose is to achieve the best 
reproduction quality that can be achieved within this bandwidth (i.e., adaptive 
algorithms). 

Discussed below is an illustration formulated to solve for the optimized partitions
(i.e., base and enhancement layer partitions). In the following discussion it is assumed
that there are n DCT blocks for each video frame and the bit rate budget $R_b$ is known
for the base layer partition. The rate budget is determined based on the minimal video
quality requirement and channel throughput fluctuation. Then, the following optimization
problem can be formulated to solve for the optimal partitions:

$$\min_{P_1, \ldots, P_n} \sum_{i=1}^{n} D_i(P_i) \quad \text{subject to} \quad \sum_{i=1}^{n} R_i(P_i) \le R_b \qquad (1)$$

where $P_i \in \{0, 1, \ldots, K(i)\}$, $i = 1, \ldots, n$, is the break point value for the i-th
block, $K(i)$ denotes the maximum number of (run, length) pairs in the i-th block, and
$R_i(P_i)$ and $D_i(P_i)$ denote the corresponding bit rate and distortion of the i-th block,
respectively.

The optimization problem can be solved using an iterative bisection algorithm
based on a Lagrangian optimization. The optimal partitioning point $P_i$ satisfies the
following condition for all $i = 1, \ldots, n$:

$$\frac{dD_i(P_i)}{dR_i(P_i)} = -\lambda \qquad (2)$$

where the Lagrangian $\lambda > 0$ is determined by the standard bisection search so that
the rate constraint in (1) is satisfied.
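
The optimality condition above follows from the standard Lagrangian relaxation of the rate-constrained problem. The short derivation below is a sketch that treats rate and distortion as differentiable functions of the break point, which is an idealization because $P_i$ is discrete. The symbol $J$ for the Lagrangian cost is introduced here for illustration only.

```latex
\begin{aligned}
J(P_1,\ldots,P_n;\lambda)
  &= \sum_{i=1}^{n} D_i(P_i)
   + \lambda \left( \sum_{i=1}^{n} R_i(P_i) - R_b \right) \\
\frac{\partial J}{\partial P_i} = 0
  \;&\Rightarrow\;
  \frac{dD_i}{dP_i} + \lambda \frac{dR_i}{dP_i} = 0
  \;\Rightarrow\;
  \frac{dD_i(P_i)}{dR_i(P_i)} = -\lambda , \qquad i = 1,\ldots,n
\end{aligned}
```

Since the same $\lambda$ appears for every block, all blocks are cut at the same slope of their individual rate-distortion curves, and $\lambda$ alone controls the total base layer rate.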

If the k-th DCT (run, length) pair for the i-th block is $L_i^k$ bits and has a coefficient
value of $X_i^k$, then the slope of the rate-distortion (R-D) curve of the i-th block at the k-th
DCT (run, length) pair has the following set of discrete values:

$$\left. \frac{dD_i(P_i)}{dR_i(P_i)} \right|_{P_i = k} = \frac{D_i(k) - D_i(k-1)}{R_i(k) - R_i(k-1)} = -\frac{|X_i^k|^2}{L_i^k}, \quad k = 1, \ldots, K(i) \qquad (3)$$



Referring now to Fig. 4, a convex R-D curve is shown to illustrate how to determine
the partition point and how the layered source decoder 130 can infer the partition point in a
backward-adaptive fashion. It is noted that the layered source decoder 130 operates in the
same way even if the R-D curve is not convex.
From Fig. 4, if the rate-distortion curve is convex, it can be seen that in general the slope
magnitude $|dD_i/dR_i|$ is a decreasing function with respect to R and therefore, in general,
the following relationship holds:

$$\frac{|X_i^1|^2}{L_i^1} \ge \frac{|X_i^2|^2}{L_i^2} \ge \cdots \ge \frac{|X_i^{K(i)}|^2}{L_i^{K(i)}} \qquad (4)$$

In accordance with Eq. (4), a partitioning algorithm for the DCT coefficients at the
layered source encoder 110 side is given below for the case where the rate-distortion curve
is convex. It is noted that, to get to this point, the video data for a frame is converted using
the discrete cosine transform (DCT), the DCT coefficients are quantized, and the quantized
coefficients are then converted into binary codewords for (run, length) pairs using variable
length coding (VLC).

for i=1,...,n {                // for each macroblock in frame
    for k=1,...,K(i) {         // for each (run, length) pair
        Compute the corresponding $X_i^k$, $L_i^k$.
        Put the k-th (run, length) VLC into base layer.
        if $|X_i^k|^2 / L_i^k < \lambda$ break;
    }
    Put the remaining (run, length) pairs of i-th block into ENH layer.
}



The Lagrangian parameter $\lambda$ may be separately encoded and transmitted as side
information (i.e., overhead information). The layered source decoder 130 can find the
boundary of the base layer 121 and enhancement layer 122, as well as the
synchronization, using the following algorithm:

for i=1,...,n {                // for each macroblock in frame
    for k=1,...,K(i) {         // for each (run, length) pair
        Read VLC (run, length) pair from base layer.
        Compute the corresponding $X_i^k$, $L_i^k$.
        if $|X_i^k|^2 / L_i^k < \lambda$ break;
    }
    Read the remaining (run, length) pairs of i-th block from ENH layer.
}
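
The two loops above can be sketched in Python to show that the decoder recovers the boundary from base layer data alone. The sketch rests on several assumptions that are not in the text: each pair is represented as a (coefficient value, VLC length in bits) tuple, a hypothetical end-of-block marker `EOB` terminates each block in whichever layer holds its last pair, and the function names are illustrative.

```python
EOB = None  # hypothetical end-of-block marker


def below_threshold(level, bits, lam):
    """Test |X|^2 / L < lambda for one (run, length) pair."""
    return level * level / bits < lam


def encode_frame(blocks, lam):
    """Split every block into base and enhancement (ENH) layer streams."""
    base, enh = [], []
    for pairs in blocks:
        for k, (level, bits) in enumerate(pairs):
            base.append((level, bits))        # k-th pair goes to the base layer
            if below_threshold(level, bits, lam):
                enh.extend(pairs[k + 1:])     # remaining pairs go to the ENH layer
                enh.append(EOB)
                break
        else:
            base.append(EOB)                  # no break: block ends in the base layer
    return base, enh


def decode_frame(base, enh, num_blocks, lam):
    """Merge the two streams, inferring each boundary from decoded data."""
    base_it, enh_it = iter(base), iter(enh)
    blocks = []
    for _ in range(num_blocks):
        pairs, source = [], base_it
        while True:
            item = next(source)
            if item is EOB:
                break
            pairs.append(item)
            if source is base_it and below_threshold(*item, lam):
                source = enh_it               # boundary found: switch layers
        blocks.append(pairs)
    return blocks


blocks = [[(10, 4), (6, 5), (2, 6), (1, 7)], [(3, 8), (1, 6)], [(9, 3)]]
lam = 5.0
base, enh = encode_frame(blocks, lam)
decoded = decode_frame(base, enh, len(blocks), lam)
```

Both sides evaluate the same test on the same already-decoded pair, which is why no break point needs to be sent; only `lam` is shared.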

As discussed above, the only side information to be transmitted is the Lagrangian
parameter $\lambda$. The value of $\lambda$ is determined to meet the rate budget $R_b$ of
Eq. (1) using the standard one-dimensional bisection algorithm. However, the optimal value
of $\lambda$ can be a real number and should be quantized for transmission over the channel 120.
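
A bisection search of this kind might look like the following Python sketch. It assumes the same (coefficient value, VLC length in bits) pair representation as an illustration device, and the names `base_rate` and `find_lambda` are hypothetical. It relies on the observation that a larger parameter can only move each break earlier, so the base layer rate never increases as the parameter grows.

```python
def base_rate(blocks, lam):
    """Total base layer bits when each block is cut at the first pair
    whose test value |X|^2 / L falls below lam."""
    total = 0
    for pairs in blocks:
        for level, bits in pairs:
            total += bits
            if level * level / bits < lam:
                break
    return total


def find_lambda(blocks, rate_budget, iterations=40):
    """Bisection search for the smallest lam whose base rate fits the budget."""
    lo = 0.0
    # Above every test value, each block is cut after its first pair, which is
    # the smallest base layer this rule can produce.
    hi = 1.0 + max(
        (level * level / bits for pairs in blocks for level, bits in pairs),
        default=0.0,
    )
    if base_rate(blocks, lo) <= rate_budget:
        return lo                     # everything already fits in the base layer
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if base_rate(blocks, mid) > rate_budget:
            lo = mid                  # base layer too large: raise lambda
        else:
            hi = mid                  # budget met: try a smaller lambda
    return hi


blocks = [[(10, 4), (6, 5), (2, 6), (1, 7)], [(3, 8), (1, 6)], [(9, 3)]]
lam = find_lambda(blocks, rate_budget=30)
```

If the budget is below the smallest achievable base rate, the sketch simply returns the upper bound; a real encoder would need a separate policy for that case.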

In a practical implementation of variable length coding for the (run, length) pair,
however, the R-D curve of Fig. 4 may be non-convex, as shown in Fig. 5, as the VLC is
only an approximation of the true entropy of the source. In that case, the test variable
$|X_i^k|^2 / L_i^k$ is no longer monotonic with respect to k, the partitioning rule
given by Eq. (4) is not valid, and the near-optimality of RDDP can be broken, as shown in
Fig. 5. Note that the optimal breakpoint value may be $k_2$ while the RDDP algorithm
provides $k_1$, which makes the base layer under-partitioned.

Accordingly, in a preferred embodiment, the convex hull is approximated using a 
parametric model which is continuously being updated at the encoder and decoder 
simultaneously using previously decoded (run, length) pairs. 

More specifically, in a preferred embodiment, the following partitioning rule is used:

$$-\frac{\partial D_i(R_i(k); \theta_i(k))}{\partial R_i(k)} \begin{cases} \ge \lambda, & k \le B_i \\ < \lambda, & k > B_i \end{cases} \qquad (5)$$

where $D_i(R; \theta_i)$ denotes the i-th block base layer distortion model with respect to the
rate R with a parameter vector $\theta_i$, $R_i(k)$ denotes the rate if k (run, level) pairs are
included, $\theta_i(k)$ is an estimated parameter for the i-th block using k (run, level) pairs,
and $B_i$ is the break point of the i-th block.

In Eq. (5), any rate-distortion model can be used as long as it is a convex and
monotonically decreasing function. For example, an exponential distortion model may be
used:

$$D(R; \theta) = \sigma^2 \exp(-\alpha R) \qquad (6)$$

where $\theta = \{\sigma, \alpha\}$ is the unknown parameter vector to be estimated.

For the distortion model of Eq. (6), the partitioning rule becomes:

$$\sigma^2(k)\, \alpha(k) \exp(-\alpha(k) R_i(k)) \begin{cases} \ge \lambda, & k \le B_i \\ < \lambda, & k > B_i \end{cases}$$

where $\sigma(k)$ and $\alpha(k)$ are the parameters estimated using the k (run, level) VLC pairs.
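
For the exponential model, the quantity compared with the Lagrangian parameter is the negated derivative of the distortion with respect to rate. The Python sketch below evaluates that quantity and shows one possible way to estimate the two parameters from (rate, distortion) samples. The log-linear least-squares fit is an assumption made for illustration; the text does not state which estimator is used, and the function names are hypothetical.

```python
import math


def model_slope_magnitude(sigma2, alpha, rate):
    """Return -dD/dR for the model D(R) = sigma2 * exp(-alpha * R)."""
    return sigma2 * alpha * math.exp(-alpha * rate)


def fit_exponential_model(rates, distortions):
    """Least-squares fit of ln D = ln(sigma2) - alpha * R.

    Assumed estimator: needs at least two distinct rates and strictly
    positive distortions.
    """
    n = len(rates)
    logs = [math.log(d) for d in distortions]
    mean_r = sum(rates) / n
    mean_l = sum(logs) / n
    var_r = sum((r - mean_r) ** 2 for r in rates)
    cov = sum((r - mean_r) * (l - mean_l) for r, l in zip(rates, logs))
    alpha = -cov / var_r                      # negated slope of the fitted line
    sigma2 = math.exp(mean_l + alpha * mean_r)  # intercept of the fitted line
    return sigma2, alpha


# Synthetic samples drawn from the model itself, so the fit is exact up to rounding.
rates = [0.0, 4.0, 9.0, 15.0]
distortions = [200.0 * math.exp(-0.1 * r) for r in rates]
sigma2, alpha = fit_exponential_model(rates, distortions)
slope_at_start = model_slope_magnitude(sigma2, alpha, 0.0)
slope_later = model_slope_magnitude(sigma2, alpha, 10.0)
```

Because the model is convex and decreasing for positive parameters, the slope magnitude shrinks as the rate grows, which is the monotonic behavior the partitioning rule depends on.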

Accordingly, the layered source decoder 130 can find the boundary of the base layer
121 and enhancement layer 122, as well as the synchronization, using the following
algorithm, which splits the bit-stream nearly optimally without sending explicit information
about the breakpoint values:

Encoding:

Encode $\lambda$ into base partition.
for i=1,...,N                // for each DCT block
    for k=1,...,K(i)         // for each (run, level) pair
        Compute $C_i(k)$ and $L_i(k)$.
        Estimate $\theta_i(k)$ using $\{C_i(m)\}_{m=1}^{k}$ and $\{L_i(m)\}_{m=1}^{k}$, and update the
        parametric distortion function $D_i(R_i(k); \theta_i(k))$.
        Put the k-th (run, level) VLC into base partition.
        if $-\partial D_i(R_i(k); \theta_i(k)) / \partial R_i(k) < \lambda$, break.
    end
    Put the remaining (run, level) pairs into enhancement partition.
end

Decoding:

Decode $\lambda$ from base partition.
for i=1,...,N                // for each DCT block
    for k=1,...,K(i)         // for each (run, level) pair
        Read the k-th (run, level) VLC from base partition.
        Compute $C_i(k)$ and $L_i(k)$.
        Estimate $\theta_i(k)$ using $\{C_i(m)\}_{m=1}^{k}$ and $\{L_i(m)\}_{m=1}^{k}$, and update the
        parametric distortion function $D_i(R_i(k); \theta_i(k))$.
        if $-\partial D_i(R_i(k); \theta_i(k)) / \partial R_i(k) < \lambda$, break.
    end
    Read the remaining (run, level) pairs from enhancement partition.
end
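
The backward-adaptive idea in the encoding and decoding loops above can be sketched in Python as follows. Several choices here are assumptions for illustration and are not taken from the text: each pair is a (coefficient value, VLC length in bits) tuple, the per-pair values of energy per bit are treated as samples of the model slope, and the parameters are re-estimated after every pair with a log-linear least-squares fit. The names `fitted_slope` and `break_index` are hypothetical.

```python
import math


def fitted_slope(samples):
    """Model value of -dD/dR at the latest rate.

    samples: list of (cumulative_rate, energy_per_bit) tuples seen so far.
    Assumed estimator: fit ln(s) = a + b * R by least squares, which matches
    the slope of an exponential distortion model when b is negative.
    """
    if len(samples) < 2:
        return samples[-1][1]        # a single sample cannot be fitted
    rates = [r for r, _ in samples]
    logs = [math.log(s) for _, s in samples]
    n = len(samples)
    mean_r, mean_l = sum(rates) / n, sum(logs) / n
    var_r = sum((r - mean_r) ** 2 for r in rates)
    b = sum((r - mean_r) * (l - mean_l) for r, l in zip(rates, logs)) / var_r
    return math.exp(mean_l + b * (rates[-1] - mean_r))


def break_index(pairs, lam):
    """Number of leading pairs of one block that go to the base partition.

    Only pairs already placed in the base partition are used, so a decoder
    reading the base partition reaches the same index.
    """
    samples, rate = [], 0
    for k, (level, bits) in enumerate(pairs, start=1):
        rate += bits
        samples.append((rate, level * level / bits))
        if fitted_slope(samples) < lam:
            return k
    return len(pairs)


# The third pair dips below lam although the fourth does not (non-convex case).
pairs = [(10, 4), (8, 5), (4, 4), (6, 5), (1, 7)]
lam = 4.3
simple = next(k for k, (c, l) in enumerate(pairs, 1) if c * c / l < lam)
parametric = break_index(pairs, lam)
```

In this example the unsmoothed test stops at the third pair, while the fitted model carries the base partition past the local dip. How well this works in practice depends on the estimator, which is why the one used here is flagged as an assumption.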



As explained above, the only side information to be transmitted is the Lagrangian
parameter $\lambda$. The value of $\lambda$ is determined to meet the rate budget $R_b$ of
Eq. (1) using the standard one-dimensional bisection algorithm. Then, it is quantized and
transmitted once in each frame header, hence the rate overhead is negligible.

Therefore, by transmitting the $\lambda$ value and the corresponding low frequency and
some high frequency DCT coefficients (as the base layer 121) over a more reliable
transmission channel, greater dynamic allocation of the DCT information is achievable.
This allows for more control of the minimal quality of the video in case data from one or
more of the enhancement layers 122-124 is lost.

Furthermore, the parametric model approximates the convex hull of the rate 
distortion curve, hence preventing under-partitioning from occurring even in non-convex 
rate-distortion function cases. 

The embodiments of the present invention discussed above are applicable to any
scalable video coding system, e.g., MPEG 2, MPEG 4, H.263, etc.

Figure 6 shows a representative embodiment of a computer system 9 on which the
present invention may be implemented. As shown in Figure 6, personal computer ("PC")
10 includes network connection 11 for interfacing to a network, such as a
variable-bandwidth network or the Internet, and fax/modem connection 12 for interfacing with
other remote sources such as a video camera (not shown). PC 10 also includes display
screen 14 for displaying information (including video data) to a user, keyboard 15 for
inputting text and user commands, mouse 13 for positioning a cursor on display screen 14
and for inputting user commands, disk drive 16 for reading from and writing to floppy
disks installed therein, and CD-ROM drive 17 for accessing information stored on
CD-ROM. PC 10 may also have one or more peripheral devices attached thereto, such as a
scanner (not shown) for inputting document text images, graphics images, or the like, and
printer 19 for outputting images, text, or the like.

Figure 7 shows the internal structure of PC 10. As shown in Figure 7, PC 10 
includes memory 20, which comprises a computer-readable medium such as a computer 
hard disk. Memory 20 stores data 23, applications 25, print driver 24, and operating 
system 26. In preferred embodiments of the invention, operating system 26 is a 
windowing operating system, such as Microsoft Windows 2000, although the invention
may be used with other operating systems as well. Among the applications stored in 
memory 20 are scalable video coder 21 and scalable video decoder 22. Scalable video 
coder 21 performs scalable video data encoding in the manner set forth in detail below, 
and scalable video decoder 22 decodes video data that has been coded in the manner 
prescribed by scalable video coder 21. 

Also included in PC 10 are display interface 29, keyboard interface 30, mouse
interface 31, disk drive interface 32, CD-ROM drive interface 34, computer bus 36, RAM
37, processor 38, and printer interface 40. Processor 38 preferably comprises a
microprocessor or the like for executing applications, such as those noted above, out of
RAM 37. Such applications, including scalable video coder 21 and scalable video
decoder 22, may be stored in memory 20 (as noted above) or, alternatively, on a floppy
disk in disk drive 16 or a CD-ROM in CD-ROM drive 17. Processor 38 accesses
applications (or other data) stored on a floppy disk via disk drive interface 32 and accesses
applications (or other data) stored on a CD-ROM via CD-ROM drive interface 34.

Application execution and other tasks of PC 10 may be initiated using keyboard 15
or mouse 13, commands from which are transmitted to processor 38 via keyboard
interface 30 and mouse interface 31, respectively. Output results from applications
running on PC 10 may be processed by display interface 29 and then displayed to a user
on display 14 or, alternatively, output via network connection 11. For example, input
video data which has been coded by scalable video coder 21 is typically output via
network connection 11. On the other hand, coded video data received from, e.g., a
variable-bandwidth network is decoded by scalable video decoder 22 and then displayed
on display 14. To this end, display interface 29 preferably comprises a display processor
for forming video images based on decoded video data provided by processor 38 over
computer bus 36, and for outputting those images to display 14. Output results from other
applications, such as word processing programs, running on PC 10 may be provided to
printer 19 via printer interface 40. Processor 38 executes print driver 24 so as to perform
appropriate formatting of such print jobs prior to their transmission to printer 19.

Another embodiment of the present invention is directed to a scalable transcoder.
As shown in Fig. 8, a single layer coded video bitstream 200 (MPEG-1, MPEG-2, MPEG-4,
H.264, etc.) is partially decoded by a variable length decoder 210. The DCT coefficients
220 are sent to an inverse scan/quantization unit 230 and then to a partitioning line finder
240. The bitstream splitting point is determined for each DCT block based on the
boundary determining method embodiment discussed above. Afterwards, VLC codes 250
are split into two or more partitions based on the splitting points. The results are provided
to a variable length code buffer 260. In accordance with the embodiment, the partial
decoding involves variable length decoding, inverse scanning and inverse quantization
only. No inverse DCT or motion compensation is needed.

Although the embodiments of the invention described herein are preferably 
implemented as computer code, all or some of the embodiments discussed above can be 
implemented using discrete hardware elements and/or logic circuits. Also, while the 
encoding and decoding techniques of the present invention have been described in a PC
environment, these techniques can be used in any type of video device including, but not
limited to, digital televisions/set top boxes, video conferencing equipment, and the like.

In this regard, the present invention has been described with respect to particular 
illustrative embodiments. For example, principles of the present invention as described in 
the embodiments above may also be applied to partition enhancement layers. It is to be 
understood that the invention is not limited to the above-described embodiments and 
modifications thereto, and that various changes and modifications may be made by those 
of ordinary skill in the art without departing from the spirit and scope of the appended 
claims. 
WHAT IS CLAIMED IS: 



1. A method for partitioning data for a scalable video encoder, the 
method comprising the steps of: 

receiving video data; 

determining DCT coefficients for a plurality of macroblocks of a video frame; 

quantizing the DCT coefficients; 

converting the quantized DCT coefficients into (run, length) pairs; and 

for each of the plurality of macroblocks in the video frame, determining a ratio 
$\frac{\partial D_i(R_i(k);\theta_i(k))}{\partial R_i(k)}$, where $D_i(R;\theta)$ represents a distortion model for an i-th 
block, $R_i(k)$ represents a rate for a k-(run, level) pair, and $\theta_i(k)$ represents an 
estimated parameter for the i-th block using a k-(run, level) pair, and 

if $\frac{\partial D_i(R_i(k);\theta_i(k))}{\partial R_i(k)}$ is less than $\lambda$ or if $\frac{\partial D_i(R_i(k);\theta_i(k))}{\partial R_i(k)}$ is a first 
ratio that is not less than $\lambda$, putting the k-th (run, length) pair into a base layer, 
otherwise if $\frac{\partial D_i(R_i(k);\theta_i(k))}{\partial R_i(k)}$ is greater than $\lambda$, putting the k-th (run, length) 
pair into an enhancement layer, where $\lambda$ is determined in accordance with a 
Lagrangian calculation. 



2. The method according to Claim 1, further comprising the step of 
transmitting the base and enhancement layers over different transmission 
channels. 

3. The method according to Claim 1, wherein the scalable video encoder 
is an MPEG 4 encoder. 

4. The method according to Claim 1, wherein the scalable video encoder 
is an H.263 encoder. 

5. The method according to Claim 1, wherein the scalable video encoder 
is an MPEG 2 encoder. 

6. The method according to Claim 1, wherein the scalable video encoder 
is a video encoder which has a DCT transform and entropy coding. 

7. The method according to Claim 1, wherein the scalable video encoder 
is realized by transcoding single layer MPEG2, MPEG4, and H.26L. 

8. The method according to Claim 1, further comprising the step of 
quantizing $\lambda$ and transmitting the quantized value as side information to a 
decoder. 

9. The method according to Claim 8, wherein the side information is 
sent only once in a frame header for the video frame. 

10. The method according to Claim 8, wherein the side information 
can be sent to a slice header or a video packet header to improve robustness. 

11. The method according to Claim 1, wherein $\lambda$ is determined to meet 
a rate budget for a transmission channel for the base layer using a bisection 
algorithm. 

12. The method according to Claim 1, wherein $\lambda$ is determined to meet 
a rate budget for a transmission channel for the base layer using an adaptive 
algorithm. 
13. A method for determining a boundary between a base layer and at 
least one enhancement layer in a scalable video decoder, the method comprising the steps 
of: 

receiving the base layer and the at least one enhancement layer, the base 
layer and enhancement layer including data representing (run, length) pairs for a 
plurality of macroblocks in a video frame; 

for each of the plurality of macroblocks in the video frame, determining a ratio 
$\frac{\partial D_i(R_i(k);\theta_i(k))}{\partial R_i(k)}$, where $D_i(R;\theta)$ represents a distortion model for an i-th 
block, $R_i(k)$ represents a rate for a k-(run, level) pair, and $\theta_i(k)$ represents an 
estimated parameter for the i-th block using a k-(run, level) pair, and 

if $\frac{\partial D_i(R_i(k);\theta_i(k))}{\partial R_i(k)}$ is less than $\lambda$ or if $\frac{\partial D_i(R_i(k);\theta_i(k))}{\partial R_i(k)}$ is the first 
ratio that is not less than $\lambda$, reading the k-th (run, length) pair from the base layer, 
otherwise if the ratio $\frac{\partial D_i(R_i(k);\theta_i(k))}{\partial R_i(k)}$ is greater than $\lambda$, reading the k-th (run, 
length) pair from the at least one enhancement layer, where $\lambda$ is determined by 
decoding side information. 



14. The method according to Claim 13, further comprising the step of 
receiving the base layer and enhancement layer over different transmission 
channels. 

15. The method according to Claim 13, wherein the scalable video decoder 
is an MPEG 4 decoder. 

16. The method according to Claim 13, wherein the scalable video decoder 
is an H.263 decoder. 

17. The method according to Claim 13, wherein the scalable video decoder 
is an MPEG 2 decoder. 

18. The method according to Claim 13, wherein the scalable video decoder 
is a video decoder that uses DCT and entropy coding. 

19. The method according to Claim 13, wherein the scalable video decoder is 
realized by a merger in front of a single layer video decoder selected from the 
group consisting of an MPEG2, MPEG4, and H.26L decoder. 

20. The method according to Claim 13, further comprising the step of 
receiving $\lambda$ as side information associated with the video frame. 

21. The method according to Claim 20, wherein the side information is 
sent only once in a frame header for the video frame. 

22. The method according to Claim 20, wherein the side information is 
copied for each slice header or video packet header to increase robustness. 

23. The method according to Claim 13, wherein $\lambda$ is determined to 
meet a rate budget for a transmission channel for the base layer. 
24. A scalable decoder capable of merging data from a base layer and 
at least one enhancement layer, comprising: 

a memory which stores computer-executable process steps; and 

a processor which executes the process steps stored in the memory so as to (1) 
receive the base layer and the at least one enhancement layer, the base layer and 
enhancement layer including data representing (run, length) pairs for a plurality of 
macroblocks in a video frame, and (2) for each of the plurality of macroblocks in the 
video frame, determine a ratio $\frac{\partial D_i(R_i(k);\theta_i(k))}{\partial R_i(k)}$, where $D_i(R;\theta)$ represents 
a distortion model for an i-th block, $R_i(k)$ represents a rate for a k-(run, level) pair, 
and $\theta_i(k)$ represents an estimated parameter for the i-th block using a k-(run, level) 
pair, and (3) if $\frac{\partial D_i(R_i(k);\theta_i(k))}{\partial R_i(k)}$ is less than $\lambda$ or if $\frac{\partial D_i(R_i(k);\theta_i(k))}{\partial R_i(k)}$ is a first 
ratio that is not less than $\lambda$, read the k-th (run, length) pair from the base layer, 
otherwise if $\frac{\partial D_i(R_i(k);\theta_i(k))}{\partial R_i(k)}$ is greater than $\lambda$, read the k-th (run, length) pair from 
the at least one enhancement layer, where $\lambda$ is determined in accordance with a 
Lagrangian calculation. 



25. The decoder according to Claim 24, wherein $\lambda$ is received by the 
decoder as side information associated with the video frame and the side 
information is sent only once in a frame header for the video frame. 

26. The decoder according to Claim 24, wherein $\lambda$ is determined to 
meet a rate budget for a transmission channel for the base layer. 
ABSTRACT 

A system and method are disclosed that provide a simple and efficient 
layered video coding technique using a parametric rate-distortion (RD) model. 
The video coding system may include a rate-distortion optimized data 
partitioning encoder and decoder. The generalized RD-DP encoder adapts the 
partition point block by block, which greatly improves the coding efficiency of the 
base layer bit stream without explicit transmission of the partition points, thereby 
saving bandwidth significantly. Furthermore, even for the non-parametric rate-distortion 
curves, the parametric rate-distortion model prevents underpartitioning of the base layer, 
and the parametric model is updated simultaneously at the encoder and decoder for 
synchronization. 



Figure 1 (block labels: Non-Layered Bitstream, Demux, Partitioning Control Unit, Base layer, Enh. layer) 

Figure 2 (block labels: Base Layer, Enh. Layer, VLD1, VLD2, MUX, Non-Layered Bitstream) 



FIG. (number illegible; block labels: CD-ROM Interface, Printer Interface, RAM, Memory, Print Driver, Operating System, Data, Applications, Scalable Video Coder, Scalable Video Decoder) 






